geometric median
Simple and Optimal Sublinear Algorithms for Mean Estimation
We study the sublinear multivariate mean estimation problem in d-dimensional Euclidean space. Specifically, we aim to find the mean µ of a ground point set A, which minimizes the sum of squared Euclidean distances of the points in Ato µ. We first show that a multiplicative (1 + ε) approximation to µ can be found with probability 1 δ using O(ε 1 logδ 1)many independent uniform random samples, and provide a matching lower bound. Furthermore, we give two estimators with optimal sample complexity that can be computed in optimal running time for extracting a suitable approximate mean: 1.
Coreset for Robust Geometric Median: Eliminating Size Dependency on Outliers
Fang, Ziyi, Huang, Lingxiao, Yang, Runkai
We study the robust geometric median problem in Euclidean space $\mathbb{R}^d$, with a focus on coreset construction.A coreset is a compact summary of a dataset $P$ of size $n$ that approximates the robust cost for all centers $c$ within a multiplicative error $\varepsilon$. Given an outlier count $m$, we construct a coreset of size $\tilde{O}(\varepsilon^{-2} \cdot \min\{\varepsilon^{-2}, d\})$ when $n \geq 4m$, eliminating the $O(m)$ dependency present in prior work [Huang et al., 2022 & 2023]. For the special case of $d = 1$, we achieve an optimal coreset size of $\tildeΘ(\varepsilon^{-1/2} + \frac{m}{n} \varepsilon^{-1})$, revealing a clear separation from the vanilla case studied in [Huang et al., 2023; Afshani and Chris, 2024]. Our results further extend to robust $(k,z)$-clustering in various metric spaces, eliminating the $m$-dependence under mild data assumptions. The key technical contribution is a novel non-component-wise error analysis, enabling substantial reduction of outlier influence, unlike prior methods that retain them.Empirically, our algorithms consistently outperform existing baselines in terms of size-accuracy tradeoffs and runtime, even when data assumptions are violated across a wide range of datasets.